# 16kHz Sampling Rate
Whisper Medium Cv11 German Ct2
Apache-2.0
An automatic speech recognition model based on OpenAI's whisper-medium, fine-tuned on the Common Voice 11.0 German dataset
Speech Recognition
Transformers German

mkenfenheuer
21
1
Whisper Medium Medical De AUT
A German medical-domain speech recognition model based on the Whisper Medium architecture, fine-tuned and optimized for Austrian Standard German pronunciation
Speech Recognition
Transformers German

valhofec
20
2
Vits Eng
MIT
An English text-to-speech model based on the VITS architecture, trained by Kakao Enterprise, supporting high-quality speech synthesis
Speech Synthesis
Transformers English

BricksDisplay
28
4
Whisper Base Japanese
Apache-2.0
A Japanese speech recognition model based on openai/whisper-base, fine-tuned on the Common Voice, JVS, and JSUT datasets.
Speech Recognition
Transformers Japanese

Ivydata
137
3
Whisper Large V2 Cv11 German
Apache-2.0
An automatic speech recognition model based on openai/whisper-large-v2, fine-tuned on the Common Voice 11.0 German dataset for German speech-to-text, with a word error rate of 5.76
Speech Recognition
Transformers German

bofenghuang
179
16
Wav2vec2 Large Chinese Zh Cn
Apache-2.0
A Chinese speech recognition model fine-tuned from the XLSR-53 large model, supporting audio input sampled at 16 kHz
Speech Recognition
Transformers Chinese

wbbbbb
585
40
Exp W2v2t Zh Cn Wavlm S596
Apache-2.0
A Simplified Chinese speech recognition model fine-tuned from microsoft/wavlm-large on the Common Voice 7.0 (zh-CN) dataset.
Speech Recognition
Transformers

jonatasgrosman
22
1
Exp W2v2t Fr Xls R S250
Apache-2.0
An automatic speech recognition model fine-tuned using the Common Voice 7.0 French dataset, based on the facebook/wav2vec2-xls-r-300m model
Speech Recognition
Transformers French

jonatasgrosman
20
0
Exp W2v2t Ja Vp It S544
Apache-2.0
A Japanese automatic speech recognition model fine-tuned using the training set of Common Voice 7.0 (Japanese version), based on the facebook/wav2vec2-large-it-voxpopuli model.
Speech Recognition
Transformers Japanese

jonatasgrosman
18
0
Exp W2v2t Ja Unispeech Sat S884
Apache-2.0
A Japanese automatic speech recognition model fine-tuned from microsoft/unispeech-sat-large on the Common Voice 7.0 Japanese dataset.
Speech Recognition
Transformers Japanese

jonatasgrosman
19
0
Exp W2v2t Ja Wavlm S729
Apache-2.0
A Japanese automatic speech recognition model fine-tuned from microsoft/wavlm-large on the Common Voice 7.0 Japanese dataset
Speech Recognition
Transformers Japanese

jonatasgrosman
15
2
Exp W2v2t Ja Unispeech S569
Apache-2.0
A Japanese automatic speech recognition model fine-tuned using the Common Voice 7.0 (Japanese) dataset, based on the microsoft/unispeech-large-1500h-cv model
Speech Recognition
Transformers Japanese

jonatasgrosman
14
0
Exp W2v2t Ja Xlsr 53 S109
Apache-2.0
A Japanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice 7.0 Japanese dataset
Speech Recognition
Transformers Japanese

jonatasgrosman
20
0
Exp W2v2t En Unispeech Sat S459
Apache-2.0
An English speech recognition model fine-tuned from Microsoft's UniSpeech-SAT-Large model, supporting audio input sampled at 16 kHz.
Speech Recognition
Transformers English

jonatasgrosman
22
0
Wav2vec2 Large Tedlium
Apache-2.0
Wav2Vec2 large speech recognition model fine-tuned on the TEDLIUM corpus, supporting English speech-to-text conversion
Speech Recognition English
sanchit-gandhi
58
1
Wav2vec2 Large Xlsr 53 Chinese Zn Cn Aishell1
Apache-2.0
A Chinese speech recognition model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on the AISHELL-1 dataset.
Speech Recognition
Transformers Chinese

qinyue
22
6
Wav2vec2 Large Xlsr Persian V3
An automatic speech recognition (ASR) model based on Facebook's wav2vec2-large-xlsr-53, fine-tuned on the Persian Common Voice dataset
Speech Recognition
Transformers Other

m3hrdadfi
1,888
37
Wav2vec2 Large Xlsr 53 German
Apache-2.0
An automatic speech recognition model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on the Common Voice German dataset, achieving a test WER of 15.80%.
Speech Recognition German
marcel
25
1
Wav2vec2 Base Timit Asr
Apache-2.0
A speech recognition model based on facebook/wav2vec2-base, fine-tuned on the timit_asr dataset, supporting audio input sampled at 16 kHz
Speech Recognition
Transformers English

elgeish
174
0
Wav2vec Test
Apache-2.0
A fine-tuned Egyptian Arabic automatic speech recognition model based on facebook/wav2vec2-large-xlsr-53, trained using the arabicspeech.org MGB-3 dataset.
Speech Recognition
Transformers Arabic

othrif
27
0
Wav2vec2 Xls R 1b German
Apache-2.0
This is a German automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple German speech datasets including Common Voice 8.0
Speech Recognition
Transformers German

jonatasgrosman
105
3
Wav2vec2 Large Xlsr Indonesian Artificial
Apache-2.0
An Indonesian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice Indonesian dataset.
Speech Recognition Other
cahya
22
0
Wav2vec2 Large Xlsr German Demo
Apache-2.0
A speech recognition model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on the German Common Voice dataset, with a word error rate of 29.35%
Speech Recognition German
marcel
23
1
Wav2vec2 Large Lv60 Timit
Apache-2.0
A speech recognition model based on facebook/wav2vec2-large-lv60, fine-tuned on the TIMIT dataset, supporting speech input sampled at 16 kHz.
Speech Recognition English
harshit345
21
1
Wav2vec2 Xlsr 300m German Truecase
A German speech recognition model based on Facebook's wav2vec2-xls-r-300m, fine-tuned on the Common Voice German dataset, that preserves letter case in its transcripts.
Speech Recognition
Transformers

abnerh
16
1
W2v Hf Jsut Xlsr53
Apache-2.0
A Japanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 using the Common Voice and JSUT datasets.
Speech Recognition
Transformers Japanese

qqpann
16
1
Sepformer Whamr16k
Apache-2.0
This is an audio source separation model based on the SepFormer architecture, trained on the WHAMR! dataset, suitable for separating audio signals at a 16kHz sampling rate.
Sound Separation English
speechbrain
4,702
12
Wav2vec2 Large Xlsr 53 Chinese Zh Cn Gpt
Apache-2.0
A Chinese (zh-CN) speech recognition model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on the Common Voice dataset
Speech Recognition
Transformers Chinese

ydshieh
127
32
English Model
An English speech recognition model based on facebook/wav2vec2-large, fine-tuned on the Common Voice dataset, supporting audio input sampled at 16 kHz.
Speech Recognition
Transformers

tanmayplanet32
30
0
Wav2vec2 Large Xlsr 53 Hk
Apache-2.0
A Cantonese speech recognition model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on the Common Voice dataset
Speech Recognition
Transformers

voidful
26
2
Convtasnet Libri2Mix Sepclean 16k
A ConvTasNet audio separation model trained with the Asteroid framework on the sep_clean task of the Libri2Mix dataset.
Sound Separation
JorisCos
13.38k
2
Wav2vec2 Large Japanese
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting audio input sampled at 16 kHz
Speech Recognition Japanese
NTQAI
316
7